-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic FlightTableFactory with a default FlightSqlDriver #11938
Conversation
cc06778
to
c38db1c
Compare
69d6239
to
820021b
Compare
Thank you @ccciudatu I have been thinking recently that we need to figure out what to do with
There is a tension for adding integrations into datafusion-cli as it makes the overall project that much more complicated @stuartcarnie and I were thinking of writing up a proposal to make a I plan to try to write this up / maybe try and make an example repo or something |
@alamb The I will squash the whole thing in a single commit and restore the original title . |
820021b
to
25b9903
Compare
25b9903
to
eeae003
Compare
I added the proto marshalling and marked the PR "ready for review", once again. |
eeae003
to
98e9b82
Compare
FWI I wrote up my thoughts here; #11979 I hope to review this PR later today One thing that might be useful to ask is "does this code need to be in the datafusion repo, or could it be put in another repo -- like for example https://github.com/datafusion-contrib/datafusion-table-providers 🤔 We could make you your own repo if you wanted |
98e9b82
to
f9d28a4
Compare
@alamb Got it, makes sense. Separately, I promise I'll stop force-pushing updates now that I know it's being reviewed. I may push some fixups though... :) |
I found my embarrassing mistake and added a fixup commit. |
// The bearer token has to be passed to the clients that perform | ||
// the DoGet operation, since Dremio, Ballista and possibly others | ||
// expect the bearer token they produce with the handshake response | ||
// to be set on all subsequent requests, including DoGet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Remove this "stolen" client implementation and submit a PR to arrow-rs to make the token visible in the official client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb on average, my PRs are of reasonable size. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate it @ccciudatu -- I am sorry I haven't had a chance to comment more thoroughly on this PR yet...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, @alamb! The fact that you are even considering reviewing such a huge patch makes you one of the nicest people I know. 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is nice of you to say -- the whole point of doing this in open source is to get other people to contribute things and a key way to do that is to actually look at their proposals 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you @ccciudatu for this contribution. It si
TLDR is I recommend we put this code into a separate repo and indeed have made one here: https://github.com/datafusion-contrib/datafusion-table-provider-flight
The rationale is that since this code uses publically available APIs so there is no technical reason it must be in the DataFusion core crate that I can see
You raise an excellent point about a pre-integrated binary being easier to use with datafusion-cli
, and I agree such a pre-integration would be useful to users
However, given how much is already in the DataFusion core crates I think adding more features at this time is likely not a great idea.
Instead, as I wrote down, I think it would be ideal if this provider (along with others) could be pre-integrated into a single binary via something like #11979
I left some suggestions along the way in case it is helpful to you
/// Collects and reports back config entries. Only used to persuade `datafusion-cli` | ||
/// to accept the `flight.` prefix for `CREATE EXTERNAL TABLE` options. | ||
#[derive(Default, Debug, Clone)] | ||
pub struct FlightOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration system is another thing I think that a focused standalone binary would be able to do better than DataFusion's key=value system
/// .insert("CUSTOM_FLIGHT".into(), Arc::new(FlightTableFactory::new( | ||
/// Arc::new(CustomFlightDriver::default()) | ||
/// ))); | ||
/// let _ = ctx.sql(r#" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really cool
impl FlightMetadata { | ||
/// Provide custom [PlanProperties] to account for service specifics, | ||
/// such as known partitioning scheme, unbounded execution mode etc. | ||
pub fn new(info: FlightInfo, props: PlanProperties, grpc: MetadataMap) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really cool -- we actually implemented some version of this custom for InfluxDB 3.0 for services that exchange data between themselves
|
||
use crate::datasource::flight::{FlightDriver, FlightMetadata}; | ||
|
||
/// Default Flight SQL driver. Requires a `flight.sql.query` to be passed as a table option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the configuration story for coinfiguring datasources like this is sorely lacking in datafusion-cli
It would be kind of cool to support something like
create crednetials for flightsql as ....
select * from 'flightsql://myhost/SELECT%20*%20FROM%20foo'
Thank you, @alamb . I will move this to the new repo and close it from here. |
Makes sense -- thank you for your understanding |
Moved to its own repo: datafusion-contrib/datafusion-table-provider-flight#1 |
@ccciudatu Just seeing this now - I'm one of the maintainers of https://github.com/datafusion-contrib/datafusion-table-providers. I think it would be useful to have a single crate of community maintained TableProviders and I think this would fit nicely in there, and I could make you a maintainer on that repo as well. We've also done work to make the table providers in there play nicely with https://github.com/datafusion-contrib/datafusion-federation, which could also be good for this implementation. Also happy if you want to keep it in a separate crate/repo, we can link to it from README.md. |
You are right, @phillipleblanc, I enjoyed having my own repo for a while, but I soon started to feel lonely. :) |
Which issue does this PR close?
Closes #7731
Rationale for this change
Create external tables backed by Arrow Flight(SQL) services.
What changes are included in this PR?
FlightTableFactory
that can integrate any Arrow Flight RPC service as aTableProviderFactory
, expecting aFlightDriver
implementation to handle theGetFlightInfo
flow.FlightSqlDriver
implementation registered asFLIGHT_SQL
.Are these changes tested?
yes
Are there any user-facing changes?